End-to-End 6DoF Pose Estimation From Monocular RGB Images

Abstract

We present a conceptually simple framework for 6DoF object pose estimation, especially in autonomous driving scenarios. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The proposed method, 6D-VNet, extends Mask R-CNN by adding customised heads for predicting the vehicle's finer class, rotation, and translation. In contrast to previous methods, it is trained end-to-end. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where the object's distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block to capture the spatial dependencies among the detected objects. As opposed to the original non-local block implementation, the proposed weighting modification takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. We evaluate our method on the challenging real-world Pascal3D+ dataset. Our 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task (Apolloscape, 2018; Huang et al., 2018).
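
To make the idea of a joint loss with a translational term concrete, the following is a minimal sketch and not the 6D-VNet implementation. It assumes a Huber penalty on the translation vector, a quaternion-based rotation distance, and a weighted sum of the two. The function names, the weights w_trans and w_rot, and the choice of penalties are hypothetical and are not taken from the paper.

```python
import math


def huber(residual, delta=1.0):
    """Huber penalty: quadratic near zero, linear for large residuals."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def translation_loss(pred_t, true_t, delta=1.0):
    """Mean Huber penalty over the x, y, z translation components."""
    return sum(huber(p - t, delta) for p, t in zip(pred_t, true_t)) / len(true_t)


def rotation_loss(pred_q, true_q):
    """Quaternion distance 1 - |<q1, q2>| after normalising both inputs.

    The absolute value makes q and -q (the same rotation) equivalent.
    """
    norm_p = math.sqrt(sum(c * c for c in pred_q))
    norm_t = math.sqrt(sum(c * c for c in true_q))
    dot = sum(p * t for p, t in zip(pred_q, true_q)) / (norm_p * norm_t)
    return 1.0 - abs(dot)


def joint_pose_loss(pred_t, true_t, pred_q, true_q, w_trans=1.0, w_rot=1.0):
    """Weighted sum of the translation and rotation terms (assumed weights)."""
    return (w_trans * translation_loss(pred_t, true_t)
            + w_rot * rotation_loss(pred_q, true_q))


if __name__ == "__main__":
    # A vehicle 40 m ahead predicted 3 m too far, with the correct orientation.
    loss = joint_pose_loss((1.0, 0.5, 43.0), (1.0, 0.5, 40.0),
                           (1.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
    print(loss)
```

In this sketch a purely longitudinal error contributes to the loss even when the rotation is perfect, which is the kind of signal a rotation-only objective would not provide.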

Similar resources

Full 6DOF Pose Estimation from Geo-Located Images

Estimating the external calibration – the pose – of a camera with respect to its environment is a fundamental task in Computer Vision (CV). In this paper, we propose a novel method for estimating the unknown 6DOF pose of a camera with known intrinsic parameters from epipolar geometry only. For a set of geo-located reference images, we assume the camera position but not the orientation to be kno...

Camera Pose Estimation in Unknown Environments using a Sequence of Wide-Baseline Monocular Images

This paper proposes a feature-based technique for camera pose estimation in a sequence of wide-baseline images. Camera pose estimation is an important issue in many computer vision and robotics applications, such as augmented reality and visual SLAM. The proposed method can track images captured by a hand-held camera in room-sized workspaces with maximum scene depth of 3-4...

Human Pose Estimation from Monocular Images: A Comprehensive Survey

Human pose estimation refers to estimating the location of body parts in an image and how they are connected. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but each focuses on a certain category, for example model-based approaches or human motion analysis. As far as we...

Learning Monocular 3D Human Pose Estimation from Multi-view Images

Accurate 3D human pose estimation from single images is possible with sophisticated deep-net architectures that have been trained on very large datasets. However, this still leaves open the problem of capturing motions for which no such database exists. Manual annotation is tedious, slow, and error-prone. In this paper, we propose to replace most of the annotations by the use of multiple views,...

Time Consistent Estimation of End-Effectors from RGB-D Data

End-effectors are usually related to the location of the free end of a kinematic chain. Each of them contains rich structure information about the entity. Hence, estimating stable end-effectors of different entities enables robust tracking as well as a generic representation. In this paper, we present a system for end-effector estimation from RGB-D stream data. Instead of relying on a specific ...

Journal

Journal title: IEEE Transactions on Consumer Electronics

Year: 2021

ISSN: 1558-4127, 0098-3063

DOI: https://doi.org/10.1109/tce.2021.3057137